Nonlinear Noise Reduction Scheme Based on Information Flow 
We present a measurement noise reduction scheme based on the information flow of a chaotic system. The scheme requires only chaoticity and a well-defined noise level, and does not depend on other detailed characteristics of the noise. Starting with a simple map and full knowledge of the dynamics, we extend the basic idea to a general form applicable to higher-dimensional systems. Noise reduction in the Lorenz system is demonstrated as an example. We then discuss inferring the dynamics without a priori knowledge, proposing an indicator that measures predictability.

PACS numbers: 05.45.-a, 05.40.Ca, 89.70.+c



How to filter noisy parts out of a signal has long been an important question in communication and experimental research. Because the broad-band spectrum of signals from nonlinear chaotic systems usually makes traditional linear filters unsuitable, many researchers have studied noise reduction methods applicable to nonlinear systems [1, ...]. It is widely known that there are two kinds of noise: measurement noise means corruption of the data in the observation process without interfering with the dynamics itself, while dynamical noise denotes a perturbation of the system coupled to the dynamics, occurring at each time step. The noise reduction problem is quite different in the two cases, and in this paper we treat measurement noise. There exists a true orbit $\{Y_k\}_{k=1}^{N}$ satisfying certain dynamics $Y_{k+1} = M(Y_k)$ for $1 \le k \le N-1$, while one observes only a noisy orbit $\{X_k\}_{k=1}^{N}$ given by $X_k = Y_k + \eta_k$ with small $|\eta_k| < \delta$, where $\eta_k$ and $\delta$ denote the noise and the noise level, respectively. We would like to obtain a less noisy orbit $\{\hat{X}_k\}_{k=1}^{N}$, and most approaches tackle this problem by minimizing a target function with constraints, such as
$$S = \sum_{k=1}^{N} \left| \hat{X}_k - X_k \right|^2 + \sum_{k=1}^{N-1} \lambda_k \cdot \left[ \hat{X}_{k+1} - M(\hat{X}_k) \right], \tag{1}$$
where $\lambda_k$ is a Lagrange multiplier. Minimizing S
corresponds to maximizing the likelihood function P within a time interval $[t-\alpha, t+\beta]$:
where dM is the derivative of M, under the assumption that the sequence $\{\eta_k\}$ is independently Gaussian distributed with standard deviation $\sigma$. The probability distributions of the position at different times are transported to a particular time, distorted by the chaotic dynamics M, and the true data point is restricted to their intersection. Thus the maximum of the joint probability function P estimates the position of the true data point at that time. We shall discuss how this calculation simplifies if we consider its information aspects, as is done in the communication area.



Studies on communication using chaos [...] have been carried out on the basis of the understanding of chaos control [...] and chaos synchronization [15]. The main issues in this field are how to encode information in a chaotic signal whose dynamics is already known to both transmitter and receiver, and how to build a system robust against noise occurring in the communication channel, which corresponds to measurement noise. Rosa et al. [16] illustrated a filtering method using the 2x mod 1 map. This method, which will be called Rosa's method, works as follows: one picks a pair of points $(X_t, X_{t+1})$ and performs a backward iteration on $X_{t+1}$, resulting in two preimages $X_t^{(1)}$ and $X_t^{(2)}$; the one closest to $X_t$ is selected as the filtered point at time t. At each iteration this filter shrinks the noise by a factor of two, the stretching factor $e^{\lambda}$ of the map with Lyapunov exponent $\lambda = \ln 2$. Andreyev et al. [17] investigated the information aspects and applications of Rosa's method. They, however, treated only essentially one-dimensional maps, since they had to carry out the inverse mapping directly.
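Rosa's method as described above fits in a few lines of Python. This is a minimal illustration, not the code of Rosa et al.: the helper names `circ_dist`, `rosa_filter`, and `rosa_pass` are hypothetical. Measuring "closest" on the unit circle (so that 0 and 1 are identified) is our assumption for a map taken mod 1. The backward sweep, which feeds each already-filtered point into the filter for its predecessor, is likewise our illustration of repeated application.

```python
import random


def circ_dist(a, b):
    """Distance on the unit circle; 2x mod 1 identifies 0 and 1 (assumed metric)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def rosa_filter(x_t, x_next):
    """Filtered x_t: the preimage of x_next under 2x mod 1 that is closest to x_t."""
    preimages = (x_next / 2.0, (x_next + 1.0) / 2.0)
    return min(preimages, key=lambda p: circ_dist(p, x_t)) % 1.0


def rosa_pass(xs):
    """Backward sweep: each point is filtered with its already-filtered successor,
    so the error on x_t is halved once for every later observation."""
    out = list(xs)
    for t in range(len(xs) - 2, -1, -1):
        out[t] = rosa_filter(xs[t], out[t + 1])
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    delta = 0.01                         # noise level, |eta_k| <= delta
    ys = [rng.random()]
    for _ in range(19):
        ys.append((2.0 * ys[-1]) % 1.0)  # true orbit of 2x mod 1
    xs = [y + rng.uniform(-delta, delta) for y in ys]
    filtered = rosa_pass(xs)
    print("raw error at t=0:     ", circ_dist(xs[0], ys[0]))
    print("filtered error at t=0:", circ_dist(filtered[0], ys[0]))
```

As long as the noise level stays below about one sixth, the correct preimage is always the closer one. A single call to `rosa_filter` then leaves an error of at most $\delta/2$, consistent with the factor of two quoted above.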

The maximum likelihood method and Rosa's method are in fact identical, although the former originates from topological distortion [3] and the latter from information properties. From the viewpoint of information theory [...], a chaotic system itself is interpreted as an active processor of information [21]. Suppose we have a measuring tool with finite resolution. The stretching process then reveals details of the initial state that could not be resolved with the tool at that time. If only the stretching process existed, the occupied area in state space, i.e. the energy of the system, would diverge to infinity as the precision increased without bound, as Brillouin claimed in Ref. [20]. The folding process prevents this divergence, inevitably removing some stored information, so we cannot discriminate every detail of the past merely by observing the present state. Topological distortion therefore induces a flow of information bits, and successive recording of this flow yields more precise knowledge of the state in chaotic systems. Information flow is a general property of chaos; for example, Schittenkopf and Deco [...] proved that all hyperbolic chaotic systems have constant positive information rates. This idea forms the basis of our scheme, which connects the two previous methods. We first assume fully known dynamics, just as in communication, and discuss later how to deal with given data without a priori knowledge.

Following Rosa et al., we start with the 2x mod 1 map as the simplest example of the stretch-fold mechanism and also of our scheme. In binary representation, each iteration simply shifts the binary point one place to the right and discards the integer part. Let us assume that the noise level is such that only the first significant digit can be guaranteed. Suppose the initial state is observed to be $0.a_0xx\ldots$, and the first and second iterates give $0.a_1xx\ldots$ and $0.a_2xx\ldots$, respectively, where digits marked by x may be spurious. We can then say that the initial state is in fact $0.a_0a_1a_2\ldots$, which effectively reduces the noise on the initial state.
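The digit-reading argument can be checked with exact arithmetic. The toy below uses an idealized noise model of our own choosing: every observation reports the first binary digit correctly and random digits after it. The function names are hypothetical.

```python
import random
from fractions import Fraction


def iterate(bits, k):
    """k-th iterate of 2x mod 1 in binary: drop the first k digits after the point."""
    return bits[k:]


def noisy_observation(bits, rng):
    """Idealized noise: the first digit is reliable, every later digit may be spurious."""
    return [bits[0]] + [rng.randint(0, 1) for _ in bits[1:]]


def value(bits):
    """Exact value of the binary fraction 0.b0 b1 b2 ..."""
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))


def refine(observations):
    """Rebuild the initial state from the reliable first digit of each observation.

    Returns the recovered digits and the candidate interval [low, low + 2^-n).
    """
    digits = [obs[0] for obs in observations]
    low = value(digits)
    return digits, (low, low + Fraction(1, 2 ** len(digits)))


if __name__ == "__main__":
    rng = random.Random(0)
    state = [rng.randint(0, 1) for _ in range(32)]  # true initial state, 32 digits
    n = 10                                          # number of successive observations
    obs = [noisy_observation(iterate(state, k), rng) for k in range(n)]
    digits, (lo, hi) = refine(obs)
    print("recovered digits:", digits)
    print("true state lies in [%s, %s)" % (lo, hi))
```

Each extra observation pins down one more digit, so the candidate interval halves at every step. This is the same factor $e^{\lambda} = 2$ that Rosa's filter achieves.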

The above example involves two conditions: the noise level δ is known, and the dynamics is chaotic. Under these conditions we ignore the spoiled parts. This converts an observed point into a set of candidate points and leads to degeneracy (e.g. all the points whose first digit is $a_0$). We then resolve which candidate the point should be by receiving information from other, unspoiled parts of the data. Roughly speaking, a suitable temporal extension can compensate for spatial ambiguity. If a data point $X_t$ is observed, the real value $Y_t$ should lie within a finite neighborhood $I(X_t)$ whose size is set by the noise level δ. The next real value $Y_{t+1}$, which evolves deterministically from $Y_t$, also belongs to $I(X_{t+1})$, whereas this does not hold for every point $p_t \in I(X_t)$ and its successor $p_{t+1}$. Because the inverse mapping operates on a set of points, it remains meaningful even where the inverse of a single point cannot be defined. Using it, we find the n-th order refinement,
$$I^{(n)}(X_t) = \bigcap_{i=0}^{n} M^{-i}\{I(X_{t+i})\}. \tag{3}$$
In terms of the previous example, $M^{-i}\{I(X_i)\}$ with $t = 0$ means the set of binary numbers whose i-th digit is $a_i$. The n-th order refinement requires n + 1 successive measurements, and since $I^{(n+1)}(X_t) \subseteq I^{(n)}(X_t)$, the diameter of the remaining set never increases, so the algorithm is convergent:
$$\operatorname{diam} I^{(n+1)}(X_t) \le \operatorname{diam} I^{(n)}(X_t). \tag{4}$$
Equation (3) is similar to Eq. (2) of the maximum likelihood method, but the Gaussian assumption turns out to be unnecessary in our scheme: once δ is defined, other details of the noise are irrelevant. It is also worth noting that Eq. (3) formalizes the basic philosophy of Rosa's method. The difficulty in applying it, namely computing the inverse mapping, is remedied by rewriting Eq. (3) as follows:
$$M^{n}\{I^{(n)}(X_t)\} = M^{n}\{I^{(n-1)}(X_t)\} \cap I(X_{t+n}), \tag{5}$$
and this allows one to avoid calculating the inverse mapping, which is hardly possible in high-dimensional systems. We
deduce that if a point does not belong to the set on the right-hand side of Eq. (5), it cannot lie in the set on the left-hand side. All that has to be done, then, is to select the points within $I(X_t)$ that satisfy the right-hand side after n mappings. Accordingly, we iterate all nearby grid points around the observed data point, which approximate $I(X_t)$ in a discrete manner, and reject the false ones that fall outside the next expected intervals $I(X_{t+1}), I(X_{t+2}), \ldots$. We repeat the same procedure on the surviving points only, until the number of remaining points is less than a certain threshold $R_{th}$, i.e.
$$\left| I^{(m)}(X_t) \right| < R_{th},$$
where $|\cdot|$ counts the surviving grid points. $X_t$ is then corrected to $\hat{X}_t = \langle I^{(m)}(X_t) \rangle$, the average of the remaining points. The number of steps m required to reach the threshold $R_{th}$ measures the performance of noise reduction, and we call this quantity the abrasion time. Since each point has its own m, refinement also yields a sequence of abrasion times $\{m_k\}_{k=1}^{N}$. A system with short m is so sensitive that wrong guesses are rejected quickly, which makes the noise easy to clean. Later, when inferring dynamics without prior knowledge of it, we use this concept in a different context: fast abrasion implies a large deviation from the true dynamics.
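The grid-and-reject procedure above can be sketched as follows. This is a minimal, dimension-generic illustration under our own assumptions, not the authors' code:

- $I(X_t)$ is taken to be an axis-aligned box of half-width δ (max-norm), with `grid` points per axis.
- The logistic map $x \mapsto 4x(1-x)$ stands in for M only so that the example runs quickly.
- The names `refine_point` and `logistic`, the `max_steps` cap, and keeping the last nonempty set when a coarse grid loses every candidate are hypothetical choices.
- The demo also reports the relative variance defined for the Lorenz demonstration later in the text.

```python
import numpy as np


def logistic(x):
    """Stand-in chaotic map M (logistic map at r = 4); acts elementwise on arrays."""
    return 4.0 * x * (1.0 - x)


def refine_point(X, t, step, delta, grid=20, r_th=10, max_steps=60):
    """Refine the noisy point X[t]; return (corrected point, abrasion time m).

    X has shape (N, d); `step` maps an array of shape (K, d) to shape (K, d).
    """
    d = X.shape[1]
    axes = [np.linspace(-delta, delta, grid)] * d
    offsets = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)
    cand = X[t] + offsets          # grid points approximating the box I(X_t)
    images = cand.copy()           # current forward images of the surviving candidates
    m = 0
    while len(cand) >= r_th and t + m + 1 < len(X) and m < max_steps:
        nxt = step(images)
        # Keep only candidates whose next image lies inside the box I(X_{t+m+1}).
        keep = np.all(np.abs(nxt - X[t + m + 1]) <= delta, axis=1)
        if not keep.any():         # grid too coarse: stop with the last nonempty set
            break
        cand, images, m = cand[keep], nxt[keep], m + 1
    return cand.mean(axis=0), m    # X_hat_t = average of the surviving grid points


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, delta = 400, 0.01
    Y = np.empty((N, 1))
    Y[0] = 0.3
    for k in range(N - 1):
        Y[k + 1] = logistic(Y[k])                       # true orbit
    X = Y + rng.uniform(-delta, delta, size=Y.shape)    # bounded noise, |eta_k| <= delta
    out = [refine_point(X, t, logistic, delta, grid=101, r_th=3) for t in range(N)]
    X_hat = np.array([p for p, _ in out])
    m = np.array([mk for _, mk in out])
    e = np.sum((X_hat - Y) ** 2) / np.sum((X - Y) ** 2)  # relative variance
    print("relative variance e =", e, " mean abrasion time =", m.mean())
```

In the Lorenz demonstration, `step` would be a numerical flow map over one sampling interval, with `grid=20` and `r_th=10` matching the 20 × 20 × 20 grid and $R_{th} = 10$ quoted in the text. The text does not specify the sampling interval or the integrator. Only forward images are ever computed, which is the point of rewriting the refinement without the inverse map.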

Figure 1 demonstrates the result of this scheme for the Lorenz system:

$$\begin{aligned} \dot{x} &= \sigma(y - x) \\ \dot{y} &= rx - y - xz \\ \dot{z} &= xy - bz \end{aligned} \tag{6}$$

where σ = 10, r = 28 and b = 8/3. The noisy orbit $\{X_k\}$ in FIG. 1(a) is generated by introducing noise of δ ≈ 5% of the whole system size, which is enough to destroy most of the important characteristics of the attractor [...]. Our scheme corrects each point $X_k$ into $\hat{X}_k$, as depicted in (b), where 20 × 20 × 20 neighboring grid points are constructed for each data point and $R_{th}$ is set to 10 throughout the calculation. We define the relative variance as
$$e = \frac{\sum_{k=1}^{N} |\hat{X}_k - Y_k|^2}{\sum_{k=1}^{N} |X_k - Y_k|^2} \tag{7}$$
to quantify the performance of the scheme, where e < 1 means that noise is reduced (e = 0 for total noise reduction). This demonstration yields e ≈ 0.05, which implies a high point-to-point correspondence, so the scheme can be categorized as detailed noise reduction following Ref. [...]. Similar results are obtained for the Rössler system. Although Rosa et al. propose that both forward and backward iterations are necessary for high-dimensional systems, we do not perform the backward iteration. Since this noninvertible M lacks time-reversal symmetry, information flows in only one direction.
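To apply the grid procedure to Eq. (6), M has to be a map, so the flow must be sampled. The sketch below assumes a sampling interval of `dt = 0.01` integrated with classical fourth-order Runge-Kutta. Neither the interval nor the integrator is given in the text. The code also implements the relative variance defined above. The function names are hypothetical, and the way the demo interprets "5% of the whole system size" (per-axis range of the attractor) is our assumption.

```python
import numpy as np

SIGMA, R, B = 10.0, 28.0, 8.0 / 3.0   # parameter values used in the text


def lorenz(v, r=R, sigma=SIGMA, b=B):
    """Lorenz vector field for states stored along the last axis, shape (..., 3)."""
    x, y, z = v[..., 0], v[..., 1], v[..., 2]
    return np.stack([sigma * (y - x), r * x - y - x * z, x * y - b * z], axis=-1)


def lorenz_map(v, dt=0.01, substeps=4, r=R):
    """Sampled flow map over one interval dt (assumed), via classical RK4 substeps."""
    h = dt / substeps
    for _ in range(substeps):
        k1 = lorenz(v, r)
        k2 = lorenz(v + 0.5 * h * k1, r)
        k3 = lorenz(v + 0.5 * h * k2, r)
        k4 = lorenz(v + h * k3, r)
        v = v + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return v


def relative_variance(X_hat, X, Y):
    """e = sum_k |X_hat_k - Y_k|^2 / sum_k |X_k - Y_k|^2; e < 1 means noise was reduced."""
    return float(np.sum((X_hat - Y) ** 2) / np.sum((X - Y) ** 2))


if __name__ == "__main__":
    orbit = [np.array([1.0, 1.0, 1.0])]
    for _ in range(5000):
        orbit.append(lorenz_map(orbit[-1]))
    Y = np.array(orbit[1000:])                         # drop the transient
    size = Y.max(axis=0) - Y.min(axis=0)               # "whole system size" per axis (assumed)
    rng = np.random.default_rng(0)
    X = Y + 0.05 * size * rng.uniform(-1.0, 1.0, Y.shape)  # about 5% bounded noise
    print("relative variance of the raw data:", relative_variance(X, X, Y))
```

Because `lorenz_map` accepts arrays of shape (K, 3), it can be applied to a whole cloud of grid points at once. Passing a different `r` gives the family of candidate models used when r is treated as unknown.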

So far, full knowledge of the dynamics has been assumed for convenience of explanation. Although this assumption may be valid in some areas, in general we need to infer the dynamics from given raw data. Farmer and Sidorowich pointed out that how much noise one can reduce is limited by the accuracy of the approximation to the true dynamics. At first, we tried to find local linear dynamics as Kostelich and Yorke did [...], but this was not quite satisfactory because determining the size of the neighborhood was troublesome: too small a neighborhood often decreases statistical confidence, and too large a one cannot capture the fine structure of the attractor. Looking for alternatives consistent with the above scheme, we noted two points. First, the true dynamics should be the most accurate approximation among the candidate models. Second, getting closer to the true dynamics should be reflected in a longer average m.

FIG. 1: (a) Lorenz attractor with 5% noise added and (b) refined data (100,000 points each). The relative variance is reduced to about 0.05.

Let us suppose that the parameter r in Eq. (6), which represents the Rayleigh number in the convection problem [24], is unknown to us. Even given the same data as in FIG. 1(a), we must now test many Lorenz systems with different r values until we find r = 28. Figure 2 shows how the choice of r changes the distribution of $\{m_k\}$. We depict only two cases, r = 28 (correct) and r = 0 (wrong), though we observed the same tendency for intermediate values of r. The distribution looks Maxwellian in the vicinity of the true dynamics, and this Maxwellian region can be reached by processing the raw data. We give a qualitative description in terms of a statistical moment of the distribution. Imposing perturbed dynamics, we see that the abrasion time goes to zero, because our guesses are soon rejected by the observations. The average abrasion time $\bar{m} = \frac{1}{N}\sum_{k=1}^{N} m_k$ rises to 14.71 for r = 28 but is only 5.96 for r = 0. Two extreme cases elucidate the basic nature of the distribution. If the underlying dynamics is so trivial (e.g. stable periodic motion) that one can easily discover it, the future orbit is highly predictable and the distribution is pushed toward infinity. Since a non-chaotic system contains little information, our noise reduction scheme becomes ineffective, with diverging m. Conversely, if the dynamics looks totally unpredictable based on our knowledge, the distribution collapses to zero. Again, noise is not reduced at all, since the accuracy of the approximation sets an upper bound on the reduction performance, as stated above. Thus a higher $\bar{m}$ is preferable when the dynamics is unknown, while divergence of $\bar{m}$ should be avoided when the dynamics is known, which may seem contradictory at first. The balance between infinity and zero indicates a status between regularity and randomness, or between perfect predictability and unpredictability. In other words, $\bar{m}$ depends both on the system we observe and on the information we have about that system.

FIG. 2: Distribution of abrasion time in the vicinity of the true dynamics. Deviation from the correct r decreases the average abrasion time.

From the above arguments, we suggest the following algorithm for inferring dynamics. One obtains sufficiently long time signals, possibly including noise, and chooses appropriate basis functions specified by a number of parameters. After a rough estimation of the parameters by means of fitting and smoothing algorithms, one searches around those values: the higher the $\bar{m}$ found, the better the inferred dynamics. In a brief numerical experiment, we set $(\dot{x}, \dot{y}, \dot{z}) = M(x, y, z)$, where the components of M are second-order polynomials of x, y, and z with unknown coefficients. We observe that even a crude search reduces noise while approaching the true dynamics (FIG. 3). Tests of 200 random samples around our rough guess give a maximum $\bar{m} = 5.37$ (only about 37% of that of the true dynamics), but the relative variance e falls below 0.7. More advanced parameter search techniques are expected to yield the desired performance. This error tolerance of the $\bar{m}$-method is presumably due to a sort of shadowing effect. A deviated parameter acts as dynamical noise, since it is coupled to the dynamics, and an incorrect model can be shadowed by orbits with less dynamical noise (i.e. with less deviated parameters) within some distance [...].

FIG. 3: 100,000 points after a random parameter search. Compared with FIG. 1(a), the orbit becomes slightly smoother, with relative variance less than 0.7. More advanced search algorithms are expected to yield better results.
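The $\bar{m}$-method can be tried on a toy problem. In the sketch below, the logistic map $x \mapsto r x(1-x)$ with unknown r replaces the Lorenz system, an assumption made purely to keep the example small. For each candidate r, the code computes the abrasion time of every point with the same grid-and-reject rule and averages the results. The names, grid size, and threshold are hypothetical choices, and a plain scan over r stands in for a real parameter search.

```python
import numpy as np


def abrasion_time(X, t, r, delta, grid=101, r_th=3, max_steps=60):
    """Abrasion time of X[t] under the candidate model x -> r x (1 - x)."""
    images = X[t] + np.linspace(-delta, delta, grid)   # grid approximating I(X_t)
    m = 0
    while len(images) >= r_th and t + m + 1 < len(X) and m < max_steps:
        nxt = r * images * (1.0 - images)
        keep = np.abs(nxt - X[t + m + 1]) <= delta      # inside I(X_{t+m+1})?
        if not keep.any():                              # every guess rejected
            break
        images, m = nxt[keep], m + 1
    return m


def mean_abrasion_time(X, r, delta):
    """Average abrasion time m-bar over the whole series for one candidate r."""
    return float(np.mean([abrasion_time(X, t, r, delta) for t in range(len(X))]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r_true, delta, N = 3.9, 0.01, 300
    Y = np.empty(N)
    Y[0] = 0.3
    for k in range(N - 1):
        Y[k + 1] = r_true * Y[k] * (1.0 - Y[k])
    X = Y + rng.uniform(-delta, delta, N)              # bounded measurement noise
    for r in (3.6, 3.7, 3.8, 3.9, 4.0):
        print(f"r = {r:.1f}  mean abrasion time = {mean_abrasion_time(X, r, delta):.2f}")
```

On synthetic data like this, one would expect the average to be largest near the r used to generate the series. That mirrors the r = 28 versus r = 0 comparison in the text, but this toy does not reproduce the authors' numbers.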

In summary, we have suggested a nonlinear noise reduction scheme based on ideas from information theory, which requires two conditions: chaoticity and a well-defined noise level. Since the information flow gradually reveals more precise knowledge, it casts the problem as rejection of hypotheses instead of minimization. The topological considerations and information-theoretic analysis combined in our scheme provide a concise and easily applicable way to reduce noise. For the fully known Lorenz system, the noise was readily reduced to less than a twentieth. We introduced the abrasion time and proposed its average $\bar{m}$ as a quantifier for inferring dynamics. We checked this $\bar{m}$-method by performing noise reduction, since the accuracy of the inference fundamentally limits the noise-reducing capability, and it yielded the expected noise reduction.